{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9604dfa-9eb1-488f-8467-e695c11b3cd8",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --quiet neo4j langchain-community langchain-core langchain-openai langchain-text-splitters tiktoken wikipedia pandas seaborn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5a291687-fe82-4380-95b8-263111ca3d9c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import asyncio\n",
    "import getpass\n",
    "import os\n",
    "from datetime import datetime\n",
    "from hashlib import md5\n",
    "from typing import Dict, List\n",
    "\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import tiktoken\n",
    "from langchain_community.graphs import Neo4jGraph\n",
    "from langchain_community.tools import WikipediaQueryRun\n",
    "from langchain_community.utilities import WikipediaAPIWrapper\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_text_splitters import TokenTextSplitter\n",
    "from pydantic import BaseModel, Field"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e61e001a-38e5-4a00-97e7-d42a43c230b0",
   "metadata": {},
   "source": [
    "# Graph construction\n",
    "\n",
    "Use this notebook to construct a graph.\n",
    "\n",
    "# Environment Setup\n",
    "You need to set up a Neo4j instance to follow along with the examples in this blog post. The easiest way is to start a free instance on Neo4j Aura, which offers cloud instances of the Neo4j database. Alternatively, you can set up a local instance by downloading the Neo4j Desktop application and creating a local database instance.\n",
    "\n",
    "The following code instantiates a LangChain wrapper to connect to the Neo4j database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1c0e2843-cf25-4c05-b5f7-3958ae515041",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
    "os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
    "os.environ[\"NEO4J_PASSWORD\"] = \"password\"\n",
    "\n",
    "graph = Neo4jGraph(refresh_schema=False)\n",
    "\n",
    "graph.query(\"CREATE CONSTRAINT IF NOT EXISTS FOR (c:Chunk) REQUIRE c.id IS UNIQUE\")\n",
    "graph.query(\"CREATE CONSTRAINT IF NOT EXISTS FOR (c:AtomicFact) REQUIRE c.id IS UNIQUE\")\n",
    "graph.query(\"CREATE CONSTRAINT IF NOT EXISTS FOR (c:KeyElement) REQUIRE c.id IS UNIQUE\")\n",
    "graph.query(\"CREATE CONSTRAINT IF NOT EXISTS FOR (d:Document) REQUIRE d.id IS UNIQUE\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10823ead-5811-4e8f-bf66-c79b166fc00b",
   "metadata": {},
   "source": [
    "We have also added uniqueness constraints for the node types we will be using. The constraints keep node IDs unique and, because each one is backed by an index, speed up import and retrieval.\n",
    "\n",
    "You will need an OpenAI API key, which you pass in the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4edbe031-daf6-44d0-8812-5611c404c703",
   "metadata": {},
   "outputs": [
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "OpenAI API Key: ········\n"
     ]
    }
   ],
   "source": [
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f87afc85-cfb8-4cba-bc85-ab930329fa75",
   "metadata": {},
   "source": [
    "We will be using the Joan of Arc Wikipedia page in this example. We will use LangChain's built-in Wikipedia utility to retrieve the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "02db243b-8edb-442e-b7bf-2d8f860d4be8",
   "metadata": {},
   "outputs": [],
   "source": [
    "wikipedia = WikipediaQueryRun(\n",
    "    api_wrapper=WikipediaAPIWrapper(doc_content_chars_max=10000)\n",
    ")\n",
    "text = wikipedia.run(\"Joan of Arc\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c928429-6c5c-4385-9b41-afe2558192aa",
   "metadata": {},
   "source": [
    "As mentioned before, the GraphReader agent expects a knowledge graph that contains chunks, related atomic facts, and key elements.\n",
    "\n",
    "![image](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*ZU6kh8gAMkQjUiUTgaNFPQ.png)\n",
    "\n",
    "First, the document is split into chunks. In the paper, the authors maintained paragraph structure while chunking. However, that is hard to do in a generic way, so we will use naive token-based chunking here.\n",
    "\n",
    "Next, each chunk is processed by the LLM to identify atomic facts, which are the smallest, indivisible units of information that capture core details. For instance, the sentence “The CEO of Neo4j, which is in Sweden, is Emil Eifrem” could be broken down into two atomic facts: “The CEO of Neo4j is Emil Eifrem.” and “Neo4j is in Sweden.” Each atomic fact focuses on one clear, standalone piece of information.\n",
    "\n",
    "From these atomic facts, key elements are identified. For the first fact, “The CEO of Neo4j is Emil Eifrem,” the key elements would be “CEO,” “Neo4j,” and “Emil Eifrem.” For the second fact, “Neo4j is in Sweden,” the key elements would be “Neo4j” and “Sweden.” These key elements are the essential nouns and proper names that capture the core meaning of each atomic fact.\n",
    "\n",
    "The prompts used to extract the graph are provided in the appendix of the paper.\n",
    "\n",
    "![image](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*U2K7VoON6thak0TeQq2svw.png)\n",
    "\n",
    "The authors used prompt-based extraction, where you instruct the LLM what it should output and then implement a function that parses the information in a structured manner. My preference for extracting structured information is to use the `with_structured_output` method in LangChain, which uses the tools feature to extract structured information. This way, we can skip defining a custom parsing function.\n",
    "\n",
    "Here is the prompt that we can use for extraction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "811b7eca-b443-4033-8989-70ad5d181e4c",
   "metadata": {},
   "outputs": [],
   "source": [
    "construction_system = \"\"\"\n",
    "You are now an intelligent assistant tasked with meticulously extracting both key elements and\n",
    "atomic facts from a long text.\n",
    "1. Key Elements: The essential nouns (e.g., characters, times, events, places, numbers), verbs (e.g.,\n",
    "actions), and adjectives (e.g., states, feelings) that are pivotal to the text’s narrative.\n",
    "2. Atomic Facts: The smallest, indivisible facts, presented as concise sentences. These include\n",
    "propositions, theories, existences, concepts, and implicit elements like logic, causality, event\n",
    "sequences, interpersonal relationships, timelines, etc.\n",
    "Requirements:\n",
    "#####\n",
    "1. Ensure that all identified key elements are reflected within the corresponding atomic facts.\n",
    "2. You should extract key elements and atomic facts comprehensively, especially those that are\n",
    "important and potentially query-worthy and do not leave out details.\n",
    "3. Whenever applicable, replace pronouns with their specific noun counterparts (e.g., change I, He,\n",
    "She to actual names).\n",
    "4. Ensure that the key elements and atomic facts you extract are presented in the same language as\n",
    "the original text (e.g., English or Chinese).\n",
    "\"\"\"\n",
    "\n",
    "construction_human = \"\"\"Use the given format to extract information from the \n",
    "following input: {input}\"\"\"\n",
    "\n",
    "construction_prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\n",
    "            \"system\",\n",
    "            construction_system,\n",
    "        ),\n",
    "        (\n",
    "            \"human\",\n",
    "            (\n",
    "                \"Use the given format to extract information from the \"\n",
    "                \"following input: {input}\"\n",
    "            ),\n",
    "        ),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecf35a6c-c011-4a5c-bead-39a090f6ca13",
   "metadata": {},
   "source": [
    "We put the instructions in the system prompt and pass the text chunk to be processed in the user message.\n",
    "\n",
    "To define the desired output, we can use a Pydantic model definition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3af12e97-edca-4aac-a00e-e6ca0411d26e",
   "metadata": {},
   "outputs": [],
   "source": [
    "class AtomicFact(BaseModel):\n",
    "    key_elements: List[str] = Field(description=\"\"\"The essential nouns (e.g., characters, times, events, places, numbers), verbs (e.g.,\n",
    "actions), and adjectives (e.g., states, feelings) that are pivotal to the atomic fact's narrative.\"\"\")\n",
    "    atomic_fact: str = Field(description=\"\"\"The smallest, indivisible facts, presented as concise sentences. These include\n",
    "propositions, theories, existences, concepts, and implicit elements like logic, causality, event\n",
    "sequences, interpersonal relationships, timelines, etc.\"\"\")\n",
    "\n",
    "class Extraction(BaseModel):\n",
    "    atomic_facts: List[AtomicFact] = Field(description=\"List of atomic facts\")"
   ]
  },
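  {
   "cell_type": "markdown",
   "id": "sketch-extraction-shape",
   "metadata": {},
   "source": [
    "As a rough illustration of the shape this schema describes, the Neo4j example sentence from earlier could be represented as plain Python data. The values below are hand-written for illustration, not actual model output.\n",
    "\n",
    "```python\n",
    "facts = [\n",
    "    {\n",
    "        'atomic_fact': 'The CEO of Neo4j is Emil Eifrem.',\n",
    "        'key_elements': ['CEO', 'Neo4j', 'Emil Eifrem'],\n",
    "    },\n",
    "    {\n",
    "        'atomic_fact': 'Neo4j is in Sweden.',\n",
    "        'key_elements': ['Neo4j', 'Sweden'],\n",
    "    },\n",
    "]\n",
    "\n",
    "# Key elements that appear in both facts (hand-written example data).\n",
    "shared = set(facts[0]['key_elements']) & set(facts[1]['key_elements'])\n",
    "```\n",
    "\n",
    "Because both facts mention `Neo4j`, they end up connected to the same `KeyElement` node once imported into the graph."
   ]
  },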
  {
   "cell_type": "markdown",
   "id": "ad295701-4929-491d-ba3f-5d9d04dd82fd",
   "metadata": {},
   "source": [
    "We want to extract a list of atomic facts, where each atomic fact contains a string field with the fact and a list of the key elements it contains. It is important to add a description to each field to get the best results.\n",
    "\n",
    "Now we can combine it all in a chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0276399c-c43b-41ea-b5ad-374a747168d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ChatOpenAI(model=\"gpt-4o-2024-08-06\", temperature=0.1)\n",
    "structured_llm = model.with_structured_output(Extraction)\n",
    "\n",
    "construction_chain = construction_prompt | structured_llm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f16717f9-4d4b-4857-9ef3-6edccdd4ac4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import_query = \"\"\"\n",
    "MERGE (d:Document {id:$document_name})\n",
    "WITH d\n",
    "UNWIND $data AS row\n",
    "MERGE (c:Chunk {id: row.chunk_id})\n",
    "SET c.text = row.chunk_text,\n",
    "    c.index = row.index,\n",
    "    c.document_name = $document_name\n",
    "MERGE (d)-[:HAS_CHUNK]->(c)\n",
    "WITH c, row\n",
    "UNWIND row.atomic_facts AS af\n",
    "MERGE (a:AtomicFact {id: af.id})\n",
    "SET a.text = af.atomic_fact\n",
    "MERGE (c)-[:HAS_ATOMIC_FACT]->(a)\n",
    "WITH c, a, af\n",
    "UNWIND af.key_elements AS ke\n",
    "MERGE (k:KeyElement {id: ke})\n",
    "MERGE (a)-[:HAS_KEY_ELEMENT]->(k)\n",
    "\"\"\"\n",
    "\n",
    "def encode_md5(text):\n",
    "    return md5(text.encode(\"utf-8\")).hexdigest()"
   ]
  },
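  {
   "cell_type": "markdown",
   "id": "sketch-md5-ids",
   "metadata": {},
   "source": [
    "The `encode_md5` helper derives each ID from the text itself, so the same text always maps to the same ID. Combined with `MERGE`, this means that if the model produces the identical atomic fact sentence twice (for example, from two different chunks), it is stored as a single node. A quick check of that property, using illustrative sample strings:\n",
    "\n",
    "```python\n",
    "from hashlib import md5\n",
    "\n",
    "\n",
    "def encode_md5(text):\n",
    "    # Deterministic ID: identical text always produces the same hex digest.\n",
    "    return md5(text.encode('utf-8')).hexdigest()\n",
    "\n",
    "\n",
    "first = encode_md5('Neo4j is in Sweden.')\n",
    "second = encode_md5('Neo4j is in Sweden.')\n",
    "other = encode_md5('The CEO of Neo4j is Emil Eifrem.')\n",
    "```\n",
    "\n",
    "Here `first` and `second` are equal, while `other` differs. Note that the match is exact: even a small wording difference yields a different ID and therefore a separate node."
   ]
  },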
  {
   "cell_type": "markdown",
   "id": "5428dfc8-99d5-4730-a2a8-a1f3e1235753",
   "metadata": {},
   "source": [
    "To put it all together, we’ll create a function that takes a single document, chunks it, extracts atomic facts and key elements, and stores the results into Neo4j."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "dd7fd334-1280-4080-b14b-1f3a6187df93",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Paper used 2k token size\n",
    "async def process_document(text, document_name, chunk_size=2000, chunk_overlap=200):\n",
    "    start = datetime.now()\n",
    "    print(f\"Started extraction at: {start}\")\n",
    "    text_splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)\n",
    "    texts = text_splitter.split_text(text)\n",
    "    print(f\"Total text chunks: {len(texts)}\")\n",
    "    tasks = [\n",
    "        asyncio.create_task(construction_chain.ainvoke({\"input\":chunk_text}))\n",
    "        for index, chunk_text in enumerate(texts)\n",
    "    ]\n",
    "    results = await asyncio.gather(*tasks)\n",
    "    print(f\"Finished LLM extraction after: {datetime.now() - start}\")\n",
    "    docs = [el.dict() for el in results]\n",
    "    for index, doc in enumerate(docs):\n",
    "        doc['chunk_id'] = encode_md5(texts[index])\n",
    "        doc['chunk_text'] = texts[index]\n",
    "        doc['index'] = index\n",
    "        for af in doc[\"atomic_facts\"]:\n",
    "            af[\"id\"] = encode_md5(af[\"atomic_fact\"])\n",
    "    # Import chunks/atomic facts/key elements\n",
    "    graph.query(import_query, \n",
    "            params={\"data\": docs, \"document_name\": document_name})\n",
    "    # Create next relationships between chunks\n",
    "    graph.query(\"\"\"MATCH (c:Chunk)<-[:HAS_CHUNK]-(d:Document)\n",
    "WHERE d.id = $document_name\n",
    "WITH c ORDER BY c.index WITH collect(c) AS nodes\n",
    "UNWIND range(0, size(nodes) -2) AS index\n",
    "WITH nodes[index] AS start, nodes[index + 1] AS end\n",
    "MERGE (start)-[:NEXT]->(end)\n",
    "\"\"\",\n",
    "           params={\"document_name\":document_name})\n",
    "    print(f\"Finished import at: {datetime.now() - start}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b3a91be-1c2a-41b7-a01f-5c59b36c38f3",
   "metadata": {},
   "source": [
    "At a high level, this code processes a document by breaking it into chunks, extracting information from each chunk using an LLM, and storing the results in a graph database. Here’s a summary:\n",
    "\n",
    "1. It splits the document text into chunks of a specified size, allowing for some overlap. The default chunk size of 2000 tokens matches the one used by the authors in the paper.\n",
    "2. For each chunk, it asynchronously sends the text to an LLM for extraction of atomic facts and key elements.\n",
    "3. Each chunk and atomic fact is given an identifier derived from the MD5 hash of its text.\n",
    "4. The processed data is imported into the graph database, and `NEXT` relationships are created between consecutive chunks.\n",
    "\n",
    "We can now run this function on our Joan of Arc text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "257853dd-f0fd-4de0-8517-20a080a62cbd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started extraction at: 2024-09-21 10:36:57.609347\n",
      "Total text chunks: 4\n",
      "Finished LLM extraction after: 0:00:12.036753\n",
      "Finished import at: 0:00:12.132462\n"
     ]
    }
   ],
   "source": [
    "await process_document(text, \"Joan of Arc\", chunk_size=500, chunk_overlap=100)"
   ]
  },
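  {
   "cell_type": "markdown",
   "id": "sketch-naive-chunking",
   "metadata": {},
   "source": [
    "To illustrate what `chunk_size` and `chunk_overlap` control, here is a toy sliding-window splitter. It is only a sketch: it treats whitespace-separated words as stand-ins for tokens, and `naive_chunks` is a hypothetical helper, not the implementation of `TokenTextSplitter`.\n",
    "\n",
    "```python\n",
    "def naive_chunks(words, chunk_size, chunk_overlap):\n",
    "    # Slide a fixed-size window over the sequence; consecutive windows\n",
    "    # share chunk_overlap items (assumes chunk_overlap < chunk_size).\n",
    "    step = chunk_size - chunk_overlap\n",
    "    chunks = []\n",
    "    for start in range(0, len(words), step):\n",
    "        chunks.append(words[start:start + chunk_size])\n",
    "        # Stop once the current window has reached the end of the input.\n",
    "        if start + chunk_size >= len(words):\n",
    "            break\n",
    "    return chunks\n",
    "\n",
    "\n",
    "words = 'a b c d e f g h'.split()\n",
    "chunks = naive_chunks(words, chunk_size=4, chunk_overlap=2)\n",
    "```\n",
    "\n",
    "With eight words, a window of four, and an overlap of two, this yields three chunks, each sharing two words with its neighbor."
   ]
  },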
  {
   "cell_type": "markdown",
   "id": "7685692c-692c-4228-9798-5b9f1d68ef90",
   "metadata": {},
   "source": [
    "We used a smaller chunk size because it’s a small document, and we want to have a couple of chunks for demonstration purposes. If you explore the graph in Neo4j Browser, you should see a visualization similar to the following.\n",
    "\n",
    "![image](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*dTypf1s6rKLFBajeKixeQQ.png)\n",
    "\n",
    "At the center of the structure is the document node (blue), which branches out to chunk nodes (pink). These chunk nodes, in turn, are linked to atomic facts (orange), each of which connects to key elements (green).\n",
    "\n",
    "Let’s examine the constructed graph a bit. We’ll start off by examining the token count distribution of atomic facts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "cb38cc85-f260-4ed2-9646-b3cd858a3baf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Axes: xlabel='tokens', ylabel='Count'>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAGwCAYAAACzXI8XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/TGe4hAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAlgklEQVR4nO3df1jV9d3H8ddR4KgJKKIICYJmoKjYD/Ni3nfTZP6YOV3bvVrayHa7VaQZrVtZqeG9jeq656zJtNbM7Zpa+5HWujc3U8B+kAlEyi418QYhBRk2OYJyJPjef9yX5w4FVDzwPZ/D83Fd3+vyfL/fA++Pn3n13OEADsuyLAEAABiol90DAAAAdBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjBdg9QFdraWnRiRMnFBwcLIfDYfc4AADgCliWpTNnzigqKkq9erX/uovfh8yJEycUHR1t9xgAAKATKisrNWzYsHav+33IBAcHS/q/v4iQkBCbpwEAAFfC5XIpOjra89/x9vh9yFz4clJISAghAwCAYS73thDe7AsAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMF2D2AySoqKlRbW2v3GLYKDw9XTEyM3WMAAHooQqaTKioqlJAwWufOnbV7FFv17dtPhw4dJGYAALYgZDqptrZW586d1aQHVikkMtbucWzhqirX3o2Zqq2tJWQAALYgZK5RSGSswmLi7R4DAIAeiTf7AgAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABj2Roye/bs0Zw5cxQVFSWHw6Ht27e3e++DDz4oh8OhtWvXdtt8AADAt9kaMg0NDUpKSlJ2dnaH923btk0ffPCBoqKiumkyAABgggA7P/msWbM0a9asDu85fvy4Fi9erL/+9a+aPXv2ZT+m2+2W2+32PHa5XNc8JwAA8E0+/R6ZlpYW3XfffXriiSeUmJh4Rc/JyspSaGio54iOju7iKQEAgF18OmSeffZZBQQEaMmSJVf8nIyMDNXV1XmOysrKLpwQAADYydYvLXWksLBQzz//vIqKiuRwOK74eU6nU06nswsnAwAAvsJnX5F55513VFNTo5iYGAUEBCggIEDHjh3T448/rtjYWLvHAwAAPsBnX5G57777lJKS0urcjBkzdN9992nhwoU2TQUAAHyJrSFTX1+v0tJSz+OysjIVFxcrLCxMMTExGjRoUKv7AwMDNXToUMXHx3f3qAAAwAfZGjIFBQWaOnWq53F6erokKTU1VZs2bbJpKgAAYApbQ2bKlCmyLOuK7y8vL++6YQAAgHF89s2+AAAAl0PIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMJatIbNnzx7NmTNHUVFRcjgc2r59u+daU1OTli1bpnHjxum6665TVFSUvvOd7+jEiRP2DQwAAHyKrSHT0NCgpKQkZWdnX3Lt7NmzKioq0ooVK1RUVKTXX39dhw8f1te+9jUbJgUAAL4owM5PPmvWLM2aNavNa6Ghodq5c2erc+vWrdNtt92miooKxcTEtPk8t9stt9vteexyubw3MNp08OBBu0ewjdvtltPp
tHsMW4WHh7f77xEAupqtIXO16urq5HA4NGDAgHbvycrKUmZmZvcN1YOdqzslyaEFCxbYPYp9HA7JsuyewlZ9+/bToUMHiRkAtjAmZBobG7Vs2TJ9+9vfVkhISLv3ZWRkKD093fPY5XIpOjq6O0bscZrOnpFkacK9yzQ4LsHucbpd1YF8lbz5Uo9dvyS5qsq1d2OmamtrCRkAtjAiZJqamvStb31LlmVp/fr1Hd7rdDp7/Ev93a3/kBiFxcTbPUa3c1WVS+q56wcAX+DzIXMhYo4dO6bdu3d3+GoMAADoWXw6ZC5EzJEjR5STk6NBgwbZPRIAAPAhtoZMfX29SktLPY/LyspUXFyssLAwRUZG6pvf/KaKior01ltvqbm5WdXV1ZKksLAwBQUF2TU2AADwEbaGTEFBgaZOnep5fOFNuqmpqXr66af15ptvSpImTJjQ6nk5OTmaMmVKd40JAAB8lK0hM2XKFFkdfOtqR9cAAAD4XUsAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMZWvI7NmzR3PmzFFUVJQcDoe2b9/e6rplWVq5cqUiIyPVt29fpaSk6MiRI/YMCwAAfI6tIdPQ0KCkpCRlZ2e3ef25557TCy+8oA0bNmjv3r267rrrNGPGDDU2NnbzpAAAwBcF2PnJZ82apVmzZrV5zbIsrV27Vk899ZTmzp0rSfrNb36jiIgIbd++Xffcc093jgoAAHyQz75HpqysTNXV1UpJSfGcCw0N1aRJk5Sfn9/u89xut1wuV6sDAAD4J58NmerqaklSREREq/MRERGea23JyspSaGio54iOju7SOQEAgH18NmQ6KyMjQ3V1dZ6jsrLS7pEAAEAX8dmQGTp0qCTp5MmTrc6fPHnSc60tTqdTISEhrQ4AAOCffDZk4uLiNHToUO3atctzzuVyae/evUpOTrZxMgAA4Cts/a6l+vp6lZaWeh6XlZWpuLhYYWFhiomJ0dKlS/WjH/1Io0aNUlxcnFasWKGoqCjNmzfPvqEBAIDPsDVkCgoKNHXqVM/j9PR0SVJqaqo2bdqk//iP/1BDQ4O+973v6fTp0/qXf/kX7dixQ3369LFrZAAA4ENsDZkpU6bIsqx2rzscDq1evVqrV6/uxqkAAIApfPY9MgAAAJdDyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIzVqZAZMWKETp06dcn506dPa8SIEdc8FAAAwJXoVMiUl5erubn5kvNut1vHjx+/5qEAAACuRMDV3Pzmm296/vzXv/5VoaGhnsfNzc3atWuXYmNjvTYcAABAR64qZObNmydJcjgcSk1NbXUtMDBQsbGx+ulPf+q14QAAADpyVSHT0tIiSYqLi9O+ffsUHh7eJUMBAABciasKmQvKysq8PQcAAMBV61TISNKuXbu0a9cu1dTUeF6puWDjxo3XPBgAAMDldCpkMjMztXr1at16662KjIyUw+Hw9lwAAACX1amQ2bBhgzZt2qT77rvP2/MAAABcsU79HJnz58/rS1/6krdnAQAAuCqdCpl///d/15YtW7w9CwAAwFXp1JeWGhsb9dJLL+ntt9/W+PHjFRgY2Or6mjVrvDJcc3Oznn76af32t79VdXW1
oqKidP/99+upp57ifTkAAKBzIbN//35NmDBBklRSUtLqmjcD49lnn9X69ev161//WomJiSooKNDChQsVGhqqJUuWeO3zAAAAM3UqZHJycrw9R5vef/99zZ07V7Nnz5YkxcbGauvWrfrwww+75fMDAADf1umfI9MdvvSlL+mll17SJ598ohtvvFEff/yx3n333Q6/dOV2u+V2uz2PXS5Xd4wK9GgHDx60ewTbhIeHKyYmxu4xgB6rUyEzderUDr+EtHv37k4P9EXLly+Xy+VSQkKCevfurebmZv34xz/W/Pnz231OVlaWMjMzvfL5AXTsXN0pSQ4tWLDA7lFs07dvPx06dJCYAWzSqZC58P6YC5qamlRcXKySkpJLfpnktfjd736nzZs3a8uWLUpMTFRxcbGWLl2qqKiodj9PRkaG0tPTPY9dLpeio6O9NhOA/9d09owkSxPuXabBcQl2j9PtXFXl2rsxU7W1tYQMYJNOhczPfvazNs8//fTTqq+vv6aBvuiJJ57Q8uXLdc8990iSxo0bp2PHjikrK6vdkHE6nXI6nV6bAcDl9R8So7CYeLvHANADdernyLRnwYIFXv09S2fPnlWvXq1H7N279yW/2wkAAPRMXn2zb35+vvr06eO1jzdnzhz9+Mc/VkxMjBITE/XRRx9pzZo1euCBB7z2OQAAgLk6FTJ33XVXq8eWZamqqkoFBQVasWKFVwaTpJ///OdasWKFHn74YdXU1CgqKkrf//73tXLlSq99DgAAYK5OhUxoaGirx7169VJ8fLxWr16t6dOne2UwSQoODtbatWu1du1ar31MAADgPzoVMq+88oq35wAAALhq1/QemcLCQs8PwkpMTNRNN93klaEAAACuRKdCpqamRvfcc49yc3M1YMAASdLp06c1depUvfrqqxo8eLA3ZwQAAGhTp779evHixTpz5oz+/ve/67PPPtNnn32mkpISuVwufpkjAADoNp16RWbHjh16++23NXr0aM+5MWPGKDs726tv9gUAAOhIp16RaWlpUWBg4CXnAwMD+WF1AACg23QqZO644w49+uijOnHihOfc8ePH9dhjj2natGleGw4AAKAjnQqZdevWyeVyKTY2ViNHjtTIkSMVFxcnl8uln//8596eEQAAoE2deo9MdHS0ioqK9Pbbb+vQoUOSpNGjRyslJcWrwwEAAHTkql6R2b17t8aMGSOXyyWHw6GvfOUrWrx4sRYvXqyJEycqMTFR77zzTlfNCgAA0MpVhczatWu1aNEihYSEXHItNDRU3//+97VmzRqvDQcAANCRqwqZjz/+WDNnzmz3+vTp01VYWHjNQwEAAFyJqwqZkydPtvlt1xcEBAToH//4xzUPBQAAcCWuKmSuv/56lZSUtHt9//79ioyMvOahAAAArsRVhcxXv/pVrVixQo2NjZdcO3funFatWqU777zTa8MBAAB05Kq+/fqpp57S66+/rhtvvFGPPPKI4uPjJUmHDh1Sdna2mpub9eSTT3bJoAAAABe7qpCJiIjQ+++/r4ceekgZGRmyLEuS5HA4NGPGDGVnZysiIqJLBgUAALjYVf9AvOHDh+vPf/6z/vnPf6q0tFSWZWnUqFEaOHBgV8wHAADQrk79ZF9JGjhwoCZOnOjNWQAAAK5Kp37XEgAAgC8gZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxvL5kDl+/LgWLFigQYMGqW/fvho3bpwKCgrsHgsAAPiAALsH6Mg///lPTZ48WVOnTtVf/vIXDR48WEeOHNHAgQPtHg0AAPgAnw6ZZ599VtHR0XrllVc85+Li4mycCAAA+BKfDpk333xTM2bM0L/9278pLy9P119/vR5++GEtWrSo3ee43W653W7PY5fL1R2jAujBDh48aPcItgoPD1dMTIzd
Y6CH8umQ+Z//+R+tX79e6enp+uEPf6h9+/ZpyZIlCgoKUmpqapvPycrKUmZmZjdPCqAnOld3SpJDCxYssHsUW/Xt20+HDh0kZmALnw6ZlpYW3XrrrfrJT34iSbrppptUUlKiDRs2tBsyGRkZSk9P9zx2uVyKjo7ulnkB9CxNZ89IsjTh3mUaHJdg9zi2cFWVa+/GTNXW1hIysIVPh0xkZKTGjBnT6tzo0aP1xz/+sd3nOJ1OOZ3Orh4NADz6D4lRWEy83WMAPZJPf/v15MmTdfjw4VbnPvnkEw0fPtymiQAAgC/x6ZB57LHH9MEHH+gnP/mJSktLtWXLFr300ktKS0uzezQAAOADfDpkJk6cqG3btmnr1q0aO3as/vM//1Nr167V/Pnz7R4NAAD4AJ9+j4wk3XnnnbrzzjvtHgMAAPggn35FBgAAoCOEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMZFTLPPPOMHA6Hli5davcoAADABxgTMvv27dOLL76o8ePH2z0KAADwEUaETH19vebPn69f/vKXGjhwoN3jAAAAH2FEyKSlpWn27NlKSUm57L1ut1sul6vVAQAA/FOA3QNczquvvqqioiLt27fviu7PyspSZmZmF08FAAB8gU+/IlNZWalHH31UmzdvVp8+fa7oORkZGaqrq/MclZWVXTwlAACwi0+/IlNYWKiamhrdfPPNnnPNzc3as2eP1q1bJ7fbrd69e7d6jtPplNPp7O5RAQCADXw6ZKZNm6YDBw60Ordw4UIlJCRo2bJll0QMAADoWXw6ZIKDgzV27NhW56677joNGjTokvMAAKDn8en3yAAAAHTEp1+RaUtubq7dIwAAAB/BKzIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIwVYPcAAACYrKKiQrW1tXaPYZvw8HDFxMTY9vkJGQAAOqmiokIJCaN17txZu0exTd++/XTo0EHbYoaQAQCgk2pra3Xu3FlNemCVQiJj7R6n27mqyrV3Y6Zqa2sJGQAATBUSGauwmHi7x+iReLMvAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADCWT4dMVlaWJk6cqODgYA0ZMkTz5s3T4cOH7R4LAAD4CJ8Omby8PKWlpemDDz7Qzp071dTUpOnTp6uhocHu0QAAgA8IsHuAjuzYsaPV402bNmnIkCEqLCzU7bffbtNUAADAV/h0yFysrq5OkhQWFtbuPW63W2632/PY5XJ1+VwA0NMdPHjQ7hFs0VPX7UuMCZmWlhYtXbpUkydP1tixY9u9LysrS5mZmd04GQD0XOfqTklyaMGCBXaPYqsm93m7R+ixjAmZtLQ0lZSU6N133+3wvoyMDKWnp3seu1wuRUdHd/V4ANAjNZ09I8nShHuXaXBcgt3jdLuqA/kqefMlff7553aP0mMZETKPPPKI3nrrLe3Zs0fDhg3r8F6n0ymn09lNkwEAJKn/kBiFxcTbPUa3c1WV2z1Cj+fTIWNZlhYvXqxt27YpNzdXcXFxdo8EAAB8iE+HTFpamrZs2aI33nhDwcHBqq6uliSFhoaqb9++Nk8HAADs5tM/R2b9+vWqq6vTlClT
FBkZ6Tlee+01u0cDAAA+wKdfkbEsy+4RAACAD/PpV2QAAAA6QsgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwlhEhk52drdjYWPXp00eTJk3Shx9+aPdIAADAB/h8yLz22mtKT0/XqlWrVFRUpKSkJM2YMUM1NTV2jwYAAGzm8yGzZs0aLVq0SAsXLtSYMWO0YcMG9evXTxs3brR7NAAAYLMAuwfoyPnz51VYWKiMjAzPuV69eiklJUX5+fltPsftdsvtdnse19XVSZJcLpdXZ6uvr5ckfXbssD53n/PqxzaFq+qYJKnu+BEFBjhsnqb79fT1S/wd9PT1S/wd9Pj1V1dI+r//Jnr7v7MXPp5lWR3faPmw48ePW5Ks999/v9X5J554wrrtttvafM6qVassSRwcHBwcHBx+cFRWVnbYCj79ikxnZGRkKD093fO4paVFn332mQYNGiSHw75adrlcio6OVmVlpUJCQmybo7v0pPWyVv/Vk9bLWv2Xqeu1LEtnzpxRVFRUh/f5dMiEh4erd+/eOnnyZKvzJ0+e1NChQ9t8jtPplNPpbHVuwIABXTXiVQsJCTHqf0jXqietl7X6r560Xtbqv0xcb2ho6GXv8ek3+wYFBemWW27Rrl27POdaWlq0a9cuJScn2zgZAADwBT79iowkpaenKzU1Vbfeeqtuu+02rV27Vg0NDVq4cKHdowEAAJv5fMjcfffd+sc//qGVK1equrpaEyZM0I4dOxQREWH3aFfF6XRq1apVl3zZy1/1pPWyVv/Vk9bLWv2Xv6/XYVmX+74mAAAA3+TT75EBAADoCCEDAACMRcgAAABjETIAAMBYhIyX7dmzR3PmzFFUVJQcDoe2b9/e6vr9998vh8PR6pg5c6Y9w16jrKwsTZw4UcHBwRoyZIjmzZunw4cPt7qnsbFRaWlpGjRokPr3769vfOMbl/yAQxNcyVqnTJlyyd4++OCDNk18bdavX6/x48d7foBWcnKy/vKXv3iu+8u+Spdfqz/t68WeeeYZORwOLV261HPOn/b2i9paqz/t7dNPP33JWhISEjzX/XVfJULG6xoaGpSUlKTs7Ox275k5c6aqqqo8x9atW7txQu/Jy8tTWlqaPvjgA+3cuVNNTU2aPn26GhoaPPc89thj+tOf/qTf//73ysvL04kTJ3TXXXfZOHXnXMlaJWnRokWt9va5556zaeJrM2zYMD3zzDMqLCxUQUGB7rjjDs2dO1d///vfJfnPvkqXX6vkP/v6Rfv27dOLL76o8ePHtzrvT3t7QXtrlfxrbxMTE1ut5d133/Vc88d99fDOr3dEWyRZ27Zta3UuNTXVmjt3ri3zdLWamhpLkpWXl2dZlmWdPn3aCgwMtH7/+9977jl48KAlycrPz7drTK+4eK2WZVlf/vKXrUcffdS+obrYwIEDrZdfftmv9/WCC2u1LP/c1zNnzlijRo2ydu7c2Wp9/ri37a3Vsvxrb1etWmUlJSW1ec0f9/WLeEXGBrm5uRoyZIji4+P10EMP6dSpU3aP5BV1dXWSpLCwMElSYWGhmpqalJKS4rknISFBMTExys/Pt2VGb7l4rRds3rxZ4eHhGjt2rDIyMnT27Fk7xvOq5uZmvfrqq2poaFBycrJf7+vFa73A3/Y1LS1Ns2fPbrWHkn/+m21vrRf4094eOXJEUVFRGjFihObPn6+KigpJ/rmvX+TzP9nX38ycOVN33XWX4uLidPToUf3whz/UrFmzlJ+fr969e9s9Xqe1tLRo6dKl
mjx5ssaOHStJqq6uVlBQ0CW/tDMiIkLV1dU2TOkdba1Vku69914NHz5cUVFR2r9/v5YtW6bDhw/r9ddft3Hazjtw4ICSk5PV2Nio/v37a9u2bRozZoyKi4v9bl/bW6vkf/v66quvqqioSPv27bvkmr/9m+1orZJ/7e2kSZO0adMmxcfHq6qqSpmZmfrXf/1XlZSU+N2+XoyQ6Wb33HOP58/jxo3T+PHjNXLkSOXm5mratGk2TnZt0tLSVFJS0uprsv6qvbV+73vf8/x53LhxioyM1LRp03T06FGNHDmyu8e8ZvHx8SouLlZdXZ3+8Ic/KDU1VXl5eXaP1SXaW+uYMWP8al8rKyv16KOPaufOnerTp4/d43SpK1mrP+3trFmzPH8eP368Jk2apOHDh+t3v/ud+vbta+NkXY8vLdlsxIgRCg8PV2lpqd2jdNojjzyit956Szk5ORo2bJjn/NChQ3X+/HmdPn261f0nT57U0KFDu3lK72hvrW2ZNGmSJBm7t0FBQbrhhht0yy23KCsrS0lJSXr++ef9cl/bW2tbTN7XwsJC1dTU6Oabb1ZAQIACAgKUl5enF154QQEBAYqIiPCbvb3cWpubmy95jsl7e7EBAwboxhtvVGlpqV/+m/0iQsZmn376qU6dOqXIyEi7R7lqlmXpkUce0bZt27R7927FxcW1un7LLbcoMDBQu3bt8pw7fPiwKioqWr3/wASXW2tbiouLJcnIvW1LS0uL3G63X+1rey6stS0m7+u0adN04MABFRcXe45bb71V8+fP9/zZX/b2cmtt60v5Ju/txerr63X06FFFRkb6/79Zu99t7G/OnDljffTRR9ZHH31kSbLWrFljffTRR9axY8esM2fOWD/4wQ+s/Px8q6yszHr77betm2++2Ro1apTV2Nho9+hX7aGHHrJCQ0Ot3Nxcq6qqynOcPXvWc8+DDz5oxcTEWLt377YKCgqs5ORkKzk52capO+dyay0tLbVWr15tFRQUWGVlZdYbb7xhjRgxwrr99tttnrxzli9fbuXl5VllZWXW/v37reXLl1sOh8P629/+ZlmW/+yrZXW8Vn/b17Zc/J07/rS3F/viWv1tbx9//HErNzfXKisrs9577z0rJSXFCg8Pt2pqaizL8u99JWS8LCcnx5J0yZGammqdPXvWmj59ujV48GArMDDQGj58uLVo0SKrurra7rE7pa11SrJeeeUVzz3nzp2zHn74YWvgwIFWv379rK9//etWVVWVfUN30uXWWlFRYd1+++1WWFiY5XQ6rRtuuMF64oknrLq6OnsH76QHHnjAGj58uBUUFGQNHjzYmjZtmidiLMt/9tWyOl6rv+1rWy4OGX/a24t9ca3+trd33323FRkZaQUFBVnXX3+9dffdd1ulpaWe6/68rw7LsqzufhUIAADAG3iPDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAnzVlyhQtXbrU7jEA+DBCBkC3IEoAdAVCBgAAGIuQAdDl7r//fuXl5en555+Xw+GQw+FQeXm58vLydNttt8npdCoyMlLLly/X559/3u7H+e///m+FhoZq8+bNkqTKykp961vf0oABAxQWFqa5c+eqvLy81eedN2+e/uu//kuRkZEaNGiQ0tLS1NTU5LnnF7/4hUaNGqU+ffooIiJC3/zmN7vs7wGA9xEyALrc888/r+TkZC1atEhVVVWqqqpSYGCgvvrVr2rixIn6+OOPtX79ev3qV7/Sj370ozY/xpYtW/Ttb39bmzdv1vz589XU1KQZM2YoODhY77zzjt577z31799fM2fO1Pnz5z3Py8nJ0dGjR5WTk6Nf//rX2rRpkzZt2iRJKigo0JIlS7R69WodPnxYO3bs0O23394dfyUAvCTA7gEA+L/Q0FAFBQWpX79+Gjp0qCTpySefVHR0tNatWyeHw6GEhASdOHFCy5Yt08qVK9Wr1////6zs7Gw9
+eST+tOf/qQvf/nLkqTXXntNLS0tevnll+VwOCRJr7zyigYMGKDc3FxNnz5dkjRw4ECtW7dOvXv3VkJCgmbPnq1du3Zp0aJFqqio0HXXXac777xTwcHBGj58uG666aZu/tsBcC0IGQC2OHjwoJKTkz0RIkmTJ09WfX29Pv30U8XExEiS/vCHP6impkbvvfeeJk6c6Ln3448/VmlpqYKDg1t93MbGRh09etTzODExUb179/Y8joyM1IEDByRJX/nKVzR8+HCNGDFCM2fO1MyZM/X1r39d/fr165I1A/A+vrQEwKfddNNNGjx4sDZu3CjLsjzn6+vrdcstt6i4uLjV8cknn+jee+/13BcYGNjq4zkcDrW0tEiSgoODVVRUpK1btyoyMlIrV65UUlKSTp8+3S1rA3DtCBkA3SIoKEjNzc2ex6NHj1Z+fn6rOHnvvfcUHBysYcOGec6NHDlSOTk5euONN7R48WLP+ZtvvllHjhzRkCFDdMMNN7Q6QkNDr3iugIAApaSk6LnnntP+/ftVXl6u3bt3X+NqAXQXQgZAt4iNjdXevXtVXl6u2tpaPfzww6qsrNTixYt16NAhvfHGG1q1apXS09NbvT9Gkm688Ubl5OToj3/8o+dn0cyfP1/h4eGaO3eu3nnnHZWVlSk3N1dLlizRp59+ekUzvfXWW3rhhRdUXFysY8eO6Te/+Y1aWloUHx/v7eUD6CKEDIBu8YMf/EC9e/fWmDFjNHjwYDU1NenPf/6zPvzwQyUlJenBBx/Ud7/7XT311FNtPj8+Pl67d+/W1q1b9fjjj6tfv37as2ePYmJidNddd2n06NH67ne/q8bGRoWEhFzRTAMGDNDrr7+uO+64Q6NHj9aGDRu0detWJSYmenPpALqQw/ri67oAAAAG4RUZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxvpf3Fyu6hcwiasAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def num_tokens_from_string(string: str) -> int:\n",
    "    \"\"\"Returns the number of tokens in a text string.\"\"\"\n",
    "    encoding = tiktoken.encoding_for_model(\"gpt-4\")\n",
    "    num_tokens = len(encoding.encode(string))\n",
    "    return num_tokens\n",
    "\n",
    "\n",
    "atomic_facts = graph.query(\"MATCH (a:AtomicFact) RETURN a.text AS text\")\n",
    "df = pd.DataFrame.from_records(\n",
    "    [{\"tokens\": num_tokens_from_string(el[\"text\"])} for el in atomic_facts]\n",
    ")\n",
    "\n",
    "sns.histplot(df[\"tokens\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c6a5fc9-61c6-49a7-9d83-e7f2da250cfb",
   "metadata": {},
   "source": [
    "Atomic facts are relatively short, with the longest being only about 50 tokens. Let’s examine a couple to get a better idea."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8b827021-8ef2-411e-a4e0-21de46d7c1a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'text': 'In 1430, Joan of Arc was captured by Burgundian forces.'},\n",
       " {'text': 'In 1920, Joan of Arc was canonized by Pope Benedict XV.'},\n",
       " {'text': 'Joan of Arc was captured by Burgundian troops on 23 May.'},\n",
       " {'text': \"Joan of Arc was put on trial by Bishop Pierre Cauchon on accusations of heresy, including blaspheming by wearing men's clothes, acting upon visions that were demonic, and refusing to submit her words and deeds to the judgment of the church.\"},\n",
       " {'text': 'Joan of Arc encouraged the French to pursue the English during the Loire Campaign, leading to a victory at Patay and allowing the French army to advance on Reims, where Charles VII was crowned king with Joan at his side.'},\n",
       " {'text': \"The film was directed by Carl Theodor Dreyer and stars Renée Jeanne Falconetti as Joan, and it is widely regarded as a landmark of cinema, especially for its production, Dreyer's direction, and Falconetti's performance.\"}]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "graph.query(\"\"\"MATCH (a:AtomicFact) \n",
    "RETURN a.text AS text\n",
    "ORDER BY size(text) ASC LIMIT 3\n",
    "UNION ALL\n",
    "MATCH (a:AtomicFact) \n",
    "RETURN a.text AS text\n",
    "ORDER BY size(text) DESC LIMIT 3\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12b1a691-92ab-4c18-b5d3-007936ccfe28",
   "metadata": {},
   "source": [
    "Let’s also examine the most frequent keywords."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4985da32-26bb-49ce-b99a-3ffafa49d86f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Axes: xlabel='key', ylabel='connections'>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAGwCAYAAACzXI8XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/TGe4hAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAxfklEQVR4nO3dfVxUdd7/8feoiHJrqAgEivf3mJoZ6pq3gV6ad5ul7apJluVNiplLWWrp0s2atleGdaWwdUlaapmZWmJgkZZra9lmJKRpq2jrBojGqPD9/dHPuZwQxBGZOfR6Ph7n8ZjzPed85zNzBnjzPd+ZsRljjAAAACyohrsLAAAAcBVBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWFYtdxdwrZWUlOjo0aPy9/eXzWZzdzkAAKACjDE6deqUwsLCVKNG2eMu1T7IHD16VBEREe4uAwAAuODIkSMKDw8vc3u1DzL+/v6SfnkiAgIC3FwNAACoiIKCAkVERDj+jpel2geZC5eTAgICCDIAAFjM5aaFMNkXAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYVi13F+BJus5+1d0lWNaeZ8e5uwQAwG8QIzIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCy3BpkkpKSFBUVpYCAAAUEBCg6OlqbN292bO/Tp49sNpvTMnnyZDdWDAAAPIlb334dHh6up556Si1btpQxRn/72980bNgw/eMf/1D79u0lSZMmTdITTzzhOMbHx8dd5QIAAA/j1iAzdOhQp/VFixYpKSlJu3btcgQZHx8fhYSEuKM8AADg4TxmjkxxcbFWr16t06dPKzo62tG+atUqNWjQQB06dFBCQoLOnDlTbj92u10FBQVOCwAAqJ7c/sm++/btU3R0tIqKiuTn56e33npL7dq1kySNHTtWTZo0UVhYmL788kvNmTNHWVlZWr9+fZn9JSYmasGCBVVVPgAAcCObMca4s4CzZ8/q8OHDys/P19q1a/XKK68oIyPDEWYutn37dvXv31/Z2dlq3rz5Jfuz2+2y2+2O9YKCAkVERCg/P18BAQHl1sJXFLiOrygAAFSmgoICBQYGXvbvt9tHZGrXrq0WLVpIkrp27ardu3fr+eef10svvVRq3+7du0tSuUHG29tb3t7e165gAADgMTxmjswFJSUlTiMqF9u7d68kKTQ0tAorAgAAnsqtIzIJCQkaNGiQGjdurFOnTik1NVXp6enaunWrcnJylJqaqsGDB6t+/fr68ssvNXPmTPXu3VtRUVHuLBsAAHgItwaZEydOaNy4cTp27JgCAwMVFRWlrVu3auDAgTpy5Ii2bdumpUuX6vTp04qIiNCoUaM0d+5cd5YMAAA8iFuDzIoVK8rcFhERoYyMjCqsBgAAWI3HzZEBAACoKIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLLcGmaSkJEVFRSkgIEABAQGKjo7W5s2bHduLioo0ZcoU1a9fX35+fho1apSOHz/uxooBAIAncWuQCQ8P11NPPaU9e/bo73//u/r166dhw4bpn//8pyRp5syZ2rhxo958801lZGTo6NGjGjlypDtLBgAAHqSWO+986NChTuuLFi1S
UlKSdu3apfDwcK1YsUKpqanq16+fJCk5OVlt27bVrl27dPPNN1+yT7vdLrvd7lgvKCi4dg8AAAC4lcfMkSkuLtbq1at1+vRpRUdHa8+ePTp37pwGDBjg2KdNmzZq3Lixdu7cWWY/iYmJCgwMdCwRERFVUT4AAHADtweZffv2yc/PT97e3po8ebLeeusttWvXTrm5uapdu7bq1avntH+jRo2Um5tbZn8JCQnKz893LEeOHLnGjwAAALiLWy8tSVLr1q21d+9e5efna+3atRo/frwyMjJc7s/b21ve3t6VWCEAAPBUbg8ytWvXVosWLSRJXbt21e7du/X888/rjjvu0NmzZ5WXl+c0KnP8+HGFhIS4qVoAAOBJ3H5p6ddKSkpkt9vVtWtXeXl5KS0tzbEtKytLhw8fVnR0tBsrBAAAnsKtIzIJCQkaNGiQGjdurFOnTik1NVXp6enaunWrAgMDFRcXp/j4eAUFBSkgIEDTpk1TdHR0me9YAgAAvy1uDTInTpzQuHHjdOzYMQUGBioqKkpbt27VwIEDJUlLlixRjRo1NGrUKNntdsXExOjFF190Z8kAAMCD2Iwxxt1FXEsFBQUKDAxUfn6+AgICyt236+xXq6iq6mfPs+PcXQIAoBqp6N9vj5sjAwAAUFEEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFluDTKJiYnq1q2b/P39FRwcrOHDhysrK8tpnz59+shmszktkydPdlPFAADAk7g1yGRkZGjKlCnatWuXPvjgA507d0633nqrTp8+7bTfpEmTdOzYMcfyzDPPuKliAADgSWq58863bNnitJ6SkqLg4GDt2bNHvXv3drT7+PgoJCSkQn3a7XbZ7XbHekFBQeUUCwAAPI5HzZHJz8+XJAUFBTm1r1q1Sg0aNFCHDh2UkJCgM2fOlNlHYmKiAgMDHUtERMQ1rRkAALiPW0dkLlZSUqIZM2aoZ8+e6tChg6N97NixatKkicLCwvTll19qzpw5ysrK0vr16y/ZT0JCguLj4x3rBQUFhBkAAKopjwkyU6ZM0VdffaWPP/7Yqf3ee+913O7YsaNCQ0PVv39/5eTkqHnz5qX68fb2lre39zWvFwAAuJ9HXFqaOnWq3n33XX344YcKDw8vd9/u3btLkrKzs6uiNAAA4MHcOiJjjNG0adP01ltvKT09XU2bNr3sMXv37pUkhYaGXuPqAACAp3NrkJkyZYpSU1O1YcMG+fv7Kzc3V5IUGBiounXrKicnR6mpqRo8eLDq16+vL7/8UjNnzlTv3r0VFRXlztIBAIAHcGuQSUpKkvTLh95dLDk5WRMmTFDt2rW1bds2LV26VKdPn1ZERIRGjRqluXPnuqFaAADgadx+aak8ERERysjIqKJqAACA1XjEZF8AAABXEGQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBluRRktmzZoo8//tixvmzZMt1www0aO3asfvrpp0orDgAAoDwuBZnZs2eroKBAkrRv3z7NmjVLgwcP1sGDBxUfH1+pBQIAAJSllisHHTx4UO3atZMkrVu3TkOGDNGf//xnff755xo8eHClFggAAFAWl0ZkateurTNnzkiStm3bpltvvVWSFBQU5BipAQAAuNZcGpHp1auX4uPj1bNnT3322Wdas2aNJOnbb79V
eHh4pRYIAABQFpdGZF544QXVqlVLa9euVVJSkq6//npJ0ubNmxUbG1upBQIAAJTFpRGZxo0b69133y3VvmTJkqsuCAAAoKJcCjKSVFJSouzsbJ04cUIlJSVO23r37n3VhQEAAFyOS0Fm165dGjt2rL7//nsZY5y22Ww2FRcXV0pxAAAA5XEpyEyePFk33nijNm3apNDQUNlstsquCwAA4LJcCjIHDhzQ2rVr1aJFi8quBwAAoMJcetdS9+7dlZ2dXdm1AAAAXBGXRmSmTZumWbNmKTc3Vx07dpSXl5fT9qioqEopDgAAoDwuBZlRo0ZJkiZOnOhos9lsMsYw2RcAAFQZl79rCQAAwN1cCjJNmjSp7DoAAACumMsfiJeTk6OlS5dq//79kqR27drpwQcfVPPmzSutOAAAgPK49K6lrVu3ql27dvrss88UFRWlqKgoffrpp2rfvr0++OCDyq4RAADgklwakfnTn/6kmTNn6qmnnirVPmfOHA0cOLBSigMAACiPSyMy+/fvV1xcXKn2iRMn6uuvv77qogAAACrCpSDTsGFD7d27t1T73r17FRwcfLU1AQAAVIhLl5YmTZqke++9V99995169OghScrMzNTTTz+t+Pj4Si0QAACgLC4Fmccee0z+/v5avHixEhISJElhYWGaP3++pk+fXqkFAgAAlMWlIGOz2TRz5kzNnDlTp06dkiT5+/tXamEAAACX49IcmYv5+/u7HGISExPVrVs3+fv7Kzg4WMOHD1dWVpbTPkVFRZoyZYrq168vPz8/jRo1SsePH7/asgEAQDVQ4RGZLl26KC0tTdddd506d+4sm81W5r6ff/55hfrMyMjQlClT1K1bN50/f16PPPKIbr31Vn399dfy9fWVJM2cOVObNm3Sm2++qcDAQE2dOlUjR45UZmZmRUsHAADVVIWDzLBhw+Tt7e24XV6QqagtW7Y4raekpCg4OFh79uxR7969lZ+frxUrVig1NVX9+vWTJCUnJ6tt27batWuXbr755lJ92u122e12x3pBQcFV1wkAADxThYPMvHnzHLfnz59/LWpRfn6+JCkoKEiStGfPHp07d04DBgxw7NOmTRs1btxYO3fuvGSQSUxM1IIFC65JfQAAwLO4NEemWbNmOnnyZKn2vLw8NWvWzKVCSkpKNGPGDPXs2VMdOnSQJOXm5qp27dqqV6+e076NGjVSbm7uJftJSEhQfn6+Yzly5IhL9QAAAM/n0ruWDh06pOLi4lLtdrtdP/zwg0uFTJkyRV999ZU+/vhjl46/wNvb23EJDAAAVG9XFGTeeecdx+2tW7cqMDDQsV5cXKy0tDQ1bdr0iouYOnWq3n33Xe3YsUPh4eGO9pCQEJ09e1Z5eXlOozLHjx9XSEjIFd8PAACoXq4oyAwfPlzSL58jM378eKdtXl5eioyM1OLFiyvcnzFG06ZN01tvvaX09PRSIahr167y8vJSWlqaRo0aJUnKysrS4cOHFR0dfSWlAwCAauiKgkxJSYkkqWnTptq9e7caNGhwVXc+ZcoUpaamasOGDfL393fMewkMDFTdunUVGBiouLg4xcfHKygoSAEBAZo2bZqio6MvOdEXAAD8trg0R+bgwYOVcudJSUmSpD59+ji1Jycna8KECZKkJUuWqEaNGho1apTsdrtiYmL04osvVsr9AwAAa3MpyEyfPl0tWrQo9b1KL7zwgrKzs7V06dIK9WOMuew+derU0bJly7Rs2TJXSgUAANWYS2+/XrdunXr27FmqvUePHlq7du1VFwUAAFARLgWZkydPOr1j6YKAgAD9+9//vuqiAAAAKsKlINOiRYtSXy8gSZs3b3b5A/EAAACulEtzZOLj4zV16lT9+OOPju9ASktL0+LFiys8PwYAAOBquRRkJk6cKLvdrkWLFunJJ5+UJEVGRiopKUnjxo2r1AIBAADK4lKQkaT7779f999/v3788UfVrVtXfn5+lVkXAADAZbk0R0aSzp8/r23b
tmn9+vWOt1EfPXpUhYWFlVYcAABAeVwakfn+++8VGxurw4cPy263a+DAgfL399fTTz8tu92u5cuXV3adAAAApbg0IvPggw/qxhtv1E8//aS6des62keMGKG0tLRKKw4AAKA8Lo3IfPTRR/rkk09Uu3Ztp/bIyEj961//qpTCAAAALselEZmSkhIVFxeXav/hhx/k7+9/1UUBAABUhEtB5tZbb3X6vBibzabCwkLNmzdPgwcPrqzaAAAAyuXSpaXFixcrJiZG7dq1U1FRkcaOHasDBw6oQYMGev311yu7RgAAgEtyKciEh4friy++0OrVq/Xll1+qsLBQcXFxuuuuu5wm/wIAAFxLLn8gXq1atfSHP/yhMmsBAAC4Ii4HmQMHDujDDz/UiRMnVFJS4rTt8ccfv+rCAAAALselIPM///M/uv/++9WgQQOFhITIZrM5ttlsNoIMAACoEi4FmYULF2rRokWaM2dOZdcDAABQYS69/fqnn37S7bffXtm1AAAAXBGXgsztt9+u999/v7JrAQAAuCIuXVpq0aKFHnvsMe3atUsdO3aUl5eX0/bp06dXSnEAAADlcSnIvPzyy/Lz81NGRoYyMjKcttlsNoIMAACoEi4FmYMHD1Z2HQAAAFfMpTkyAAAAnsClEZni4mKlpKQoLS3tkh+It3379kopDgAAoDwuBZkHH3xQKSkp+q//+i916NDB6QPxAAAAqopLQWb16tV64403NHjw4MquBwAAoMJcmiNTu3ZttWjRorJrAQAAuCIuBZlZs2bp+eeflzGmsusBAACoMJcuLX388cf68MMPtXnzZrVv377UB+KtX7++UooDAAAoj0tBpl69ehoxYkRl1wIAAHBFXAoyycnJlV0HAADAFXMpyFzw448/KisrS5LUunVrNWzYsFKKAgAAqAiXJvuePn1aEydOVGhoqHr37q3evXsrLCxMcXFxOnPmTGXXCAAAcEkuBZn4+HhlZGRo48aNysvLU15enjZs2KCMjAzNmjWrsmsEAAC4JJcuLa1bt05r165Vnz59HG2DBw9W3bp1NXr0aCUlJVVWfQAAAGVyaUTmzJkzatSoUan24OBgLi0BAIAq41KQiY6O1rx581RUVORo+/nnn7VgwQJFR0dXuJ8dO3Zo6NChCgsLk81m09tvv+20fcKECbLZbE5LbGysKyUDAIBqyKVLS0uXLlVsbKzCw8PVqVMnSdIXX3whb29vvf/++xXu5/Tp0+rUqZMmTpyokSNHXnKf2NhYp7d7e3t7u1IyAACohlwKMh07dtSBAwe0atUqffPNN5KkMWPG6K677lLdunUr3M+gQYM0aNCgcvfx9vZWSEiIK2UCAIBqzqUgk5iYqEaNGmnSpElO7StXrtSPP/6oOXPmVEpxkpSenq7g4GBdd9116tevnxYuXKj69euXub/dbpfdbnesFxQUVFotAADAs7g0R+all15SmzZtSrW3b99ey5cvv+qiLoiNjdWrr76qtLQ0Pf3008rIyNCgQYNUXFxc5jGJiYkKDAx0LBEREZVWDwAA8Cwujcjk5uYqNDS0VHvDhg117Nixqy7qgjvvvNNxu2PHjoqKilLz5s2Vnp6u/v37X/KYhIQExcfHO9YLCgoIMwAAVFMujchEREQoMzOzVHtmZqbCwsKuuqiyNGvWTA0aNFB2dnaZ+3h7eysgIMBpAQAA1ZNLIzKTJk3SjBkzdO7cOfXr10+SlJaWpocffviafrLvDz/8oJMnT15yNAgAAPz2uBRkZs+erZMnT+qBBx7Q2bNnJUl16tTRnDlzlJCQUOF+CgsLnUZXDh48qL179yooKEhBQUFasGCBRo0apZCQEOXk5Ojhhx9WixYtFBMT40rZAACgmrEZY4yrBxcWFmr//v2qW7euWrZsecWf8ZKenq6+ffuWah8/frySkpI0fPhw/eMf/1BeXp7CwsJ066236sknn7zkpwqXpaCgQIGBgcrPz7/sZaaus1+9ovrxf/Y8O87d
JQAAqpGK/v12aUTmAj8/P3Xr1s3l4/v06aPyctTWrVtd7hsAAFR/Lk32BQAA8AQEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFm13F0AcCldZ7/q7hIsa8+z49xdAgBUGUZkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZRFkAACAZbk1yOzYsUNDhw5VWFiYbDab3n77baftxhg9/vjjCg0NVd26dTVgwAAdOHDAPcUCAACP49Ygc/r0aXXq1EnLli275PZnnnlGf/3rX7V8+XJ9+umn8vX1VUxMjIqKiqq4UgAA4IlqufPOBw0apEGDBl1ymzFGS5cu1dy5czVs2DBJ0quvvqpGjRrp7bff1p133lmVpQIAAA/ksXNkDh48qNzcXA0YMMDRFhgYqO7du2vnzp1lHme321VQUOC0AACA6sljg0xubq4kqVGjRk7tjRo1cmy7lMTERAUGBjqWiIiIa1onAABwH48NMq5KSEhQfn6+Yzly5Ii7SwIAANeIxwaZkJAQSdLx48ed2o8fP+7Ydine3t4KCAhwWgAAQPXksUGmadOmCgkJUVpamqOtoKBAn376qaKjo91YGQAA8BRufddSYWGhsrOzHesHDx7U3r17FRQUpMaNG2vGjBlauHChWrZsqaZNm+qxxx5TWFiYhg8f7r6iAQCAx3BrkPn73/+uvn37Otbj4+MlSePHj1dKSooefvhhnT59Wvfee6/y8vLUq1cvbdmyRXXq1HFXyQAAwIO4Ncj06dNHxpgyt9tsNj3xxBN64oknqrAqAABgFR47RwYAAOByCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyarm7AACerevsV91dgqXteXacu0sAqjVGZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGXxyb4AYBF8yvLVqexPWeZ8uK4yzwUjMgAAwLIIMgAAwLIIMgAAwLIIMgAAwLIIMgAAwLI8OsjMnz9fNpvNaWnTpo27ywIAAB7C499+3b59e23bts2xXquWx5cMAACqiMenglq1aikkJMTdZQAAAA/k0ZeWJOnAgQMKCwtTs2bNdNddd+nw4cPl7m+321VQUOC0AACA6smjg0z37t2VkpKiLVu2KCkpSQcPHtTvfvc7nTp1qsxjEhMTFRgY6FgiIiKqsGIAAFCVPDrIDBo0SLfffruioqIUExOj9957T3l5eXrjjTfKPCYhIUH5+fmO5ciRI1VYMQAAqEoeP0fmYvXq1VOrVq2UnZ1d5j7e3t7y9vauwqoAAIC7ePSIzK8VFhYqJydHoaGh7i4FAAB4AI8OMg899JAyMjJ06NAhffLJJxoxYoRq1qypMWPGuLs0AADgATz60tIPP/ygMWPG6OTJk2rYsKF69eqlXbt2qWHDhu4uDQAAeACPDjKrV692dwkAAMCDefSlJQAAgPIQZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAA
gGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGVZIsgsW7ZMkZGRqlOnjrp3767PPvvM3SUBAAAP4PFBZs2aNYqPj9e8efP0+eefq1OnToqJidGJEyfcXRoAAHAzjw8yzz33nCZNmqS7775b7dq10/Lly+Xj46OVK1e6uzQAAOBmtdxdQHnOnj2rPXv2KCEhwdFWo0YNDRgwQDt37rzkMXa7XXa73bGen58vSSooKLjs/RXbf77Kin+7KvL8XgnOhes4F56lMs8H5+Lq8LPhOSpyLi7sY4wpf0fjwf71r38ZSeaTTz5xap89e7a56aabLnnMvHnzjCQWFhYWFhaWarAcOXKk3Kzg0SMyrkhISFB8fLxjvaSkRP/5z39Uv3592Ww2N1Z2dQoKChQREaEjR44oICDA3eX8pnEuPAfnwnNwLjxHdTkXxhidOnVKYWFh5e7n0UGmQYMGqlmzpo4fP+7Ufvz4cYWEhFzyGG9vb3l7ezu11atX71qVWOUCAgIs/cKsTjgXnoNz4Tk4F56jOpyLwMDAy+7j0ZN9a9eura5duyotLc3RVlJSorS0NEVHR7uxMgAA4Ak8ekRGkuLj4zV+/HjdeOONuummm7R06VKdPn1ad999t7tLAwAAbubxQeaOO+7Qjz/+qMcff1y5ubm64YYbtGXLFjVq1MjdpVUpb29vzZs3r9RlM1Q9zoXn4Fx4Ds6F5/itnQubMZd7XxMAAIBn8ug5MgAAAOUhyAAAAMsiyAAAAMsiyFhcbm6uBg4cKF9f32r1eTlVzRije++9V0FBQbLZbKpXr55mzJjh2B4ZGamlS5e6rT7AKlJSUpx+F82fP1833HBDhY69kn2BCwgyZZgwYYKGDx/u7jIua8mSJTp27Jj27t2rb7/9ttx9f/jhB9WuXVsdOnSoouqsY8uWLUpJSdG7776rY8eO6dtvv9WTTz7p7rKqlQkTJshms5VasrOz3V3ab0ZZ5yA2Nvaa3edDDz3k9FlgcHbxOfHy8lKjRo00cOBArVy5UiUlJe4uzxIIMhaXk5Ojrl27qmXLlgoODi5335SUFI0ePVoFBQX69NNPL9v3uXPnKqtMj5eTk6PQ0FD16NFDISEhCg4Olr+/v7vLqnZiY2N17Ngxp6Vp06ZO+5w9e9ZN1f02XOocvP7669fs/vz8/FS/fv1r1n91cOGcHDp0SJs3b1bfvn314IMPasiQITp//vwlj3HX72djTJk1uQtBpgLsdrumT5+u4OBg1alTR7169dLu3bsd24uLixUXF6emTZuqbt26at26tZ5//nmnPi6M8PzlL39RaGio6tevrylTplz2xZiUlKTmzZurdu3aat26tV577TXHtsjISK1bt06vvvqqbDabJkyYUGY/xhglJyfrj3/8o8aOHasVK1Y4bT906JBsNpvWrFmjW265RXXq1NGqVaskSStXrlT79u3l7e2t0NBQTZ06taJPnSVMmDBB06ZN0+HDh2Wz2RQZGak+ffo4XVr6NZvNppdeeklDhgyRj4+P2rZtq507dyo7O1t9+vSRr6+vevTooZycnKp7IBbg7e2tkJAQp6V///6aOnWqZsyYoQYNGigmJkaS9Nxzz6ljx47y9fVVRESEHnjgARUWFjr6unAJY+vWrWrbtq38/PwcfxAuVt7rNy8vT/fcc48aNmyogIAA9evXT1988UXVPBlucqlzcN1110n65XX9yiuvaMSIEfLx8VHLli31zjvvOB3/zjvvqGXLlqpTp4769u2rv/3tb7LZbMrLy7vk/f36clF6erpuuukmx+Xwnj176vvvv3c65rXXXlNkZKQCAwN155136tSpU5X6HHiaC+fk+uuvV5cuXfTII49ow4YN2rx5s1JSUiT9cm6SkpJ02223ydfXV4sWLZIkbdiwQV26dFGdOnXUrFkzLViwwBE0Jk6c
qCFDhjjd17lz5xQcHOz4G1BSUqLExETH369OnTpp7dq1jv3T09Nls9m0efNmde3aVd7e3vr444+r4Fm5ApXwJdXV0vjx482wYcOMMcZMnz7dhIWFmffee8/885//NOPHjzfXXXedOXnypDHGmLNnz5rHH3/c7N6923z33Xfmf//3f42Pj49Zs2aNU38BAQFm8uTJZv/+/Wbjxo3Gx8fHvPzyy2XWsH79euPl5WWWLVtmsrKyzOLFi03NmjXN9u3bjTHGnDhxwsTGxprRo0ebY8eOmby8vDL7SktLMyEhIeb8+fNm3759xt/f3xQWFjq2Hzx40EgykZGRZt26dea7774zR48eNS+++KKpU6eOWbp0qcnKyjKfffaZWbJkyVU8s54nLy/PPPHEEyY8PNwcO3bMnDhxwtxyyy3mwQcfdOzTpEkTp8ctyVx//fVmzZo1JisrywwfPtxERkaafv36mS1btpivv/7a3HzzzSY2NrbqH5CHuvhn6mK33HKL8fPzM7NnzzbffPON+eabb4wxxixZssRs377dHDx40KSlpZnWrVub+++/33FccnKy8fLyMgMGDDC7d+82e/bsMW3btjVjx4517HO51++AAQPM0KFDze7du823335rZs2aZerXr+/42a5uyjoHF0gy4eHhJjU11Rw4cMBMnz7d+Pn5OZ6P7777znh5eZmHHnrIfPPNN+b11183119/vZFkfvrpJ2PML+clMDDQ0ee8efNMp06djDHGnDt3zgQGBpqHHnrIZGdnm6+//tqkpKSY77//3rGvn5+fGTlypNm3b5/ZsWOHCQkJMY888si1eDo8QnnnpFOnTmbQoEHGmF/OTXBwsFm5cqXJyckx33//vdmxY4cJCAgwKSkpJicnx7z//vsmMjLSzJ8/3xhjTGZmpqlZs6Y5evSoo8/169cbX19fc+rUKWOMMQsXLjRt2rQxW7ZsMTk5OSY5Odl4e3ub9PR0Y4wxH374oZFkoqKizPvvv2+ys7M97ueDIFOGCy+uwsJC4+XlZVatWuXYdvbsWRMWFmaeeeaZMo+fMmWKGTVqlFN/TZo0MefPn3e03X777eaOO+4os48ePXqYSZMmObXdfvvtZvDgwY71YcOGmfHjx1/28YwdO9bMmDHDsd6pUyeTnJzsWL8QZJYuXep0XFhYmHn00Ucv27/VLVmyxDRp0sSxXpEgM3fuXMf6zp07jSSzYsUKR9vrr79u6tSpcy3LtpTx48ebmjVrGl9fX8fy+9//3txyyy2mc+fOlz3+zTffNPXr13esJycnG0kmOzvb0bZs2TLTqFEjx3p5r9+PPvrIBAQEmKKiIqf25s2bm5deeulKH54lXOoc+Pr6mkWLFhljSr+uCwsLjSSzefNmY4wxc+bMMR06dHDq89FHH61wkDl58qSR5Pgj+Wvz5s0zPj4+pqCgwNE2e/Zs071796t96B6rvCBzxx13mLZt2xpjfjk3F/8ON8aY/v37mz//+c9Oba+99poJDQ11rLdr1848/fTTjvWhQ4eaCRMmGGOMKSoqMj4+PuaTTz5x6iMuLs6MGTPGGPN/Qebtt9927QFWAY//igJ3y8nJ0blz59SzZ09Hm5eXl2666Sbt37/f0bZs2TKtXLlShw8f1s8//6yzZ8+Wmn3fvn171axZ07EeGhqqffv2lXnf+/fv17333uvU1rNnz1KXrS4nLy9P69evdxoO/MMf/qAVK1aUuhx14403Om6fOHFCR48eVf/+/a/o/n4roqKiHLcvfGVGx44dndqKiopUUFBg+W+grSx9+/ZVUlKSY93X11djxoxR165dS+27bds2JSYm6ptvvlFBQYHOnz+voqIinTlzRj4+PpIkHx8fNW/e3HFMaGioTpw4Ienyr98vvvhChYWFpeZv/Pzzz9X6kuCvz4EkBQUFOW5f/Lr29fVVQECA4znNyspSt27dnI696aabKnzfQUFBmjBhgmJiYjRw4EANGDBAo0ePVmhoqGOfyMhI
p/lpF5/T3xpjjGw2m2P94t/P0i+v4czMTMdlJumXqQ4X/5zcc889evnll/Xwww/r+PHj2rx5s7Zv3y5Jys7O1pkzZzRw4ECnfs+ePavOnTs7tf36vj0JQaYSrF69Wg899JAWL16s6Oho+fv769lnny01odbLy8tp3WazVcms9NTUVBUVFal79+6ONmOMSkpK9O2336pVq1aOdl9fX8ftunXrXvParOzi83nhl82l2njnwf/x9fVVixYtLtl+sUOHDmnIkCG6//77tWjRIgUFBenjjz9WXFyczp496wgyl/qZMv//W1cu9/otLCxUaGio0tPTS22rzh9lUNY5uOBa/55KTk7W9OnTtWXLFq1Zs0Zz587VBx98oJtvvrlK7t9K9u/f7zQZ/tc/J4WFhVqwYIFGjhxZ6tg6depIksaNG6c//elP2rlzpz755BM1bdpUv/vd7xzHS9KmTZt0/fXXOx3/6+9p+vV9exKCzGVcmGibmZmpJk2aSPplstTu3bsdk0EzMzPVo0cPPfDAA47jKuM/urZt2yozM1Pjx493tGVmZqpdu3ZX1M+KFSs0a9asUqMvDzzwgFauXKmnnnrqksf5+/srMjJSaWlp6tu37xXXD7hqz549Kikp0eLFi1Wjxi/vSXjjjTeuqI/LvX67dOmi3Nxc1apVS5GRkZVRdrXXunVrvffee05tF7/xoaI6d+6szp07KyEhQdHR0UpNTXUEGfxi+/bt2rdvn2bOnFnmPl26dFFWVla5wbR+/foaPny4kpOTtXPnTt19992Obe3atZO3t7cOHz6sW265pVLrr0oEmcvw9fXV/fffr9mzZysoKEiNGzfWM888ozNnziguLk6S1LJlS7366qvaunWrmjZtqtdee027d+8u9bbSKzV79myNHj1anTt31oABA7Rx40atX79e27Ztq3Afe/fu1eeff65Vq1apTZs2TtvGjBmjJ554QgsXLizz+Pnz52vy5MkKDg7WoEGDdOrUKWVmZmratGkuPy7gclq0aKFz587pv//7vzV06FBlZmZq+fLlV9xPea/fAQMGKDo6WsOHD9czzzyjVq1a6ejRo9q0aZNGjBjh0UPpV8Nutys3N9eprVatWmrQoMFlj73vvvv03HPPac6cOYqLi9PevXud3lVzOQcPHtTLL7+s2267TWFhYcrKytKBAwc0btw4lx5LdXHhnBQXF+v48ePasmWLEhMTNWTIkHKfm8cff1xDhgxR48aN9fvf/141atTQF198oa+++srp9/o999yjIUOGqLi42OkfY39/fz300EOaOXOmSkpK1KtXL+Xn5yszM1MBAQFO+3oy3n5dhpKSEtWq9UvOe+qppzRq1Cj98Y9/VJcuXZSdna2tW7c63rJ43333aeTIkbrjjjvUvXt3nTx50ml0xlXDhw/X888/r7/85S9q3769XnrpJSUnJ6tPnz4V7mPFihVq165dqRAjSSNGjNCJEydK/Yd1sfHjx2vp0qV68cUX1b59ew0ZMkQHDhxw5eEAFdapUyc999xzevrpp9WhQwetWrVKiYmJV9xPea9fm82m9957T71799bdd9+tVq1a6c4779T333/vmPNUHW3ZskWhoaFOS69evSp0bNOmTbV27VqtX79eUVFRSkpK0qOPPiqp9KWIS/Hx8dE333yjUaNGqVWrVrr33ns1ZcoU3XfffVf1mKzuwjmJjIxUbGysPvzwQ/31r3/Vhg0bnOZV/lpMTIzeffddvf/+++rWrZtuvvlmLVmyxHH14IIBAwYoNDRUMTExCgsLc9r25JNP6rHHHlNiYqLatm2r2NhYbdq06ar/Ea9KNnPhgjKcxMbGqkWLFnrhhRfcXQoAeKxFixZp+fLlOnLkiLtLQRkKCwt1/fXXKzk5+ZLzaayOS0u/8tNPPykzM1Pp6emaPHmyu8sBAI/y4osvqlu3bqpfv74yMzP17LPPVrsPyawuSkpK9O9//1uLFy9WvXr1dNttt7m7pGuCIPMr
EydO1O7duzVr1iwNGzbM3eUAgEc5cOCAFi5cqP/85z9q3LixZs2apYSEBHeXhUs4fPiwmjZtqvDwcKWkpDimS1Q3XFoCAACWxWRfAABgWQQZAABgWQQZAABgWQQZAABgWQQZAABgWQQZAB6nT58+ju8yA4DyEGQAAIBlEWQAAIBlEWQAeLxNmzYpMDBQq1at0pEjRzR69GjVq1dPQUFBGjZsmA4dOiRJ2rFjh7y8vEp9u/OMGTP0u9/9zg2VA7jWCDIAPFpqaqrGjBmjVatWafTo0YqJiZG/v78++ugjZWZmys/PT7GxsTp79qx69+6tZs2a6bXXXnMcf+7cOa1atUoTJ05046MAcK0QZAB4rGXLlumBBx7Qxo0bNWTIEK1Zs0YlJSV65ZVX1LFjR7Vt21bJyck6fPiw0tPTJUlxcXFKTk529LFx40YVFRVp9OjRbnoUAK6l6vkNUgAsb+3atTpx4oQyMzPVrVs3SdIXX3yh7Oxs+fv7O+1bVFSknJwcSdKECRM0d+5c7dq1SzfffLNSUlI0evRo+fr6VvljAHDtEWQAeKTOnTvr888/18qVK3XjjTfKZrOpsLBQXbt21apVq0rt37BhQ0lScHCwhg4dquTkZDVt2lSbN292jNYAqH4IMgA8UvPmzbV48WL16dNHNWvW1AsvvKAuXbpozZo1Cg4OVkBAQJnH3nPPPRozZozCw8PVvHlz9ezZsworB1CVmCMDwGO1atVKH374odatW6cZM2borrvuUoMGDTRs2DB99NFHOnjwoNLT0zV9+nT98MMPjuNiYmIUEBCghQsX6u6773bjIwBwrRFkAHi01q1ba/v27Xr99df12GOPaceOHWrcuLFGjhyptm3bKi4uTkVFRU4jNDVq1NCECRNUXFyscePGubF6ANeazRhj3F0EAFS2uLg4/fjjj3rnnXfcXQqAa4g5MgCqlfz8fO3bt0+pqamEGOA3gCADoFoZNmyYPvvsM02ePFkDBw50dzkArjEuLQEAAMtisi8AALAsggwAALAsggwAALAsggwAALAsggwAALAsggwAALAsggwAALAsggwAALCs/wfakrBWl1BoegAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "data = graph.query(\"\"\"\n",
    "MATCH (a:KeyElement) \n",
    "RETURN a.id AS key, \n",
    "       count{(a)<-[:HAS_KEY_ELEMENT]-()} AS connections\n",
    "ORDER BY connections DESC LIMIT 5\"\"\")\n",
    "df = pd.DataFrame.from_records(data)\n",
    "sns.barplot(df, x='key', y='connections')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71e86da2-d4e6-4d98-a365-a2fa4046dd05",
   "metadata": {},
   "source": [
    "Unsurprisingly, Joan of Arc is the most mentioned keyword or element. Following are broad keywords like film, English, and France. I suspect that if we parsed many documents, broad keywords would end up having a lot of connections, which might lead to some downstream problems that aren’t dealt with in the original implementation. Another minor problem is the non-determinism of the extraction, as the results will be slight different on every run.\n",
    "\n",
    "Additionally, the authors employ key element normalization as described in Lu et al. (2023), specifically using frequency filtering, rule, semantic, and association aggregation. In this implementation, we skipped this step."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
